ProjectWise Administrator Help

Document Processor Properties Dialog

Used to:

  • enable a document processor to perform extractions (required to use the feature)
  • set up a schedule for when a document processor will perform extractions (recommended, but optional)
  • configure file type extension mappings for a document processor (recommended, but optional)

This dialog opens when you right-click one of the following document processors and select Properties:

  • Full Text Indexing
  • Thumbnail Extraction
  • File Property Extraction

General tab

Setting | Description
Index Server (Full Text Indexing only)

Sets the computer on which the full text index catalog for this datasource will be maintained. The list shows the indexing services registered with this ProjectWise Integration Server.

Extraction Enabled / Indexing Enabled

Turn this setting on to enable this document processor to perform extractions. When you enable extractions, you must also specify the ProjectWise user account that the processor will use.

(Extraction Enabled is the label for Thumbnail Extraction and File Property Extraction. Indexing Enabled is the label for Full Text Indexing.)

Select user and generate login token

Click this button to select a service account with non-expiring credentials that the document processor will use to log in to the datasource and access the documents it needs to process.

The user you select is added to the ProjectWise user field, and:

  1. Can be configured to use any type of authentication.
  2. Must have these user settings turned on:
    • General > Credential expiration policy > No expiration
    • General > Enable as Service Account
    • General > Use access control (not required, but recommended)

Retry extraction in (minutes)

In general, set this value long enough for the extraction process to complete under stress load conditions, but not so long that a processing failure causes a lengthy stall. If processing fails, it does not start again until the end of the retry period. For example, with a retry period of 360 minutes, a failure at the tenth minute of processing means that nothing happens for the next 350 minutes, until processing restarts.

The default value is 180, and the possible range you can set is between 10 and 1440 minutes (24 hours).

Ideal values for this field vary from datasource to datasource, so you may want to leave the default value as is and then adjust it based on performance results. Alternatively, try a value twice as long as the time you expect a single document to take to process under stress; a value between half an hour and an hour may be a good starting point.
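
The arithmetic behind the retry period can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not ProjectWise code: the function names are hypothetical, and the sketch assumes the retry period is counted from the start of processing, as the failure example implies.

```python
# Illustrative sketch only; none of these names are part of ProjectWise.
RETRY_MIN = 10       # lowest value allowed, in minutes
RETRY_MAX = 1440     # highest value allowed, in minutes (24 hours)
RETRY_DEFAULT = 180  # default value, in minutes


def validate_retry(minutes: int) -> int:
    """Return minutes unchanged if it lies in the allowed range, else raise."""
    if not RETRY_MIN <= minutes <= RETRY_MAX:
        raise ValueError(
            f"retry must be between {RETRY_MIN} and {RETRY_MAX} minutes"
        )
    return minutes


def idle_minutes_after_failure(retry_minutes: int, failed_at_minute: int) -> int:
    """Minutes during which nothing is processed after a failure.

    Assumption: the retry period is counted from the start of processing.
    """
    validate_retry(retry_minutes)
    return max(retry_minutes - failed_at_minute, 0)


def suggested_retry(worst_case_minutes: int) -> int:
    """Twice the anticipated worst-case time, clamped to the allowed range."""
    return min(max(2 * worst_case_minutes, RETRY_MIN), RETRY_MAX)
```

With the default of 180 minutes, a failure at the tenth minute leaves 170 idle minutes under this model. An anticipated worst-case processing time of 20 minutes gives a suggested starting value of 40 minutes.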

Max documents processed in a single pass

Limits the number of documents that can be passed to processing in a single document inspection pass. For large datasources, the extraction engine processes up to this many documents at a time until all documents to be processed have been processed. If extractions are enabled and you later select Start Processing Now, no more than the number of documents specified in this field is processed at that time.

The default value is 100, and the lowest value you can set here is 1.
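
To see how this limit affects a large backlog, the following Python sketch estimates the number of passes and splits a list of documents into pass-sized batches. It is a simplified model, not the extraction engine itself; the function names and the document identifiers are hypothetical.

```python
import math
from typing import Iterator, List, Sequence

# Illustrative sketch only; none of these names are part of ProjectWise.
DEFAULT_MAX_PER_PASS = 100  # default value stated above


def passes_needed(total_documents: int,
                  max_per_pass: int = DEFAULT_MAX_PER_PASS) -> int:
    """Number of passes required to work through a backlog."""
    if max_per_pass < 1:
        raise ValueError("max_per_pass must be at least 1")
    return math.ceil(total_documents / max_per_pass)


def split_into_passes(doc_ids: Sequence[str],
                      max_per_pass: int = DEFAULT_MAX_PER_PASS
                      ) -> Iterator[List[str]]:
    """Yield successive batches, each holding at most max_per_pass documents."""
    if max_per_pass < 1:
        raise ValueError("max_per_pass must be at least 1")
    for start in range(0, len(doc_ids), max_per_pass):
        yield list(doc_ids[start:start + max_per_pass])
```

For example, a backlog of 250 documents at the default limit takes three passes in this model: two full passes of 100 and one of 50.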

Skip schedule for document property indexing (Full Text Indexing only)

When text indexing is enabled and this setting is off, text extractions run according to the defined schedule. This is the default behavior, and it is how text indexing worked before this setting was introduced.

When text indexing is enabled and this setting is on, text extractions for file contents still run according to the schedule; however, text extractions for document properties run according to the Retry extraction in (minutes) setting. For example, if your text extractions are normally set to retry every 30 minutes, then every 30 minutes the text indexing processor searches for new documents to extract document property text from, even during periods when the text indexing processor is scheduled to be sleeping.

Tip: Whether this setting is on or off, text extractions for file contents will always run according to the defined schedule.
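
The combinations described above can be summarized as a small decision function. The Python sketch below is a hypothetical model of that logic, with made-up names; it only answers whether a given kind of text extraction is allowed to run at a given moment. When the setting is on, the timing of document property extractions is governed by Retry extraction in (minutes).

```python
# Illustrative sketch only; none of these names are part of ProjectWise.
FILE_CONTENTS = "file_contents"
DOCUMENT_PROPERTIES = "document_properties"


def may_extract(kind: str,
                indexing_enabled: bool,
                skip_schedule: bool,
                in_run_window: bool) -> bool:
    """Return True if this kind of text extraction may run right now.

    in_run_window is True while the schedule says Run, False while it
    says Sleep.
    """
    if not indexing_enabled:
        return False
    if kind == FILE_CONTENTS:
        # File contents always follow the schedule.
        return in_run_window
    if kind == DOCUMENT_PROPERTIES:
        # With the setting on, document properties ignore the schedule.
        return skip_schedule or in_run_window
    raise ValueError(f"unknown extraction kind: {kind!r}")
```

In this model, the only case that differs from the default behavior is a document property extraction during a Sleep period with the setting turned on.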

Scheduled Updates tab

Used to set up an extraction schedule.

Setting | Description
Time table
Run
Sleep
Check for updated documents every (minutes)

Defines how much time elapses from the end of one inspection for new or updated documents to the beginning of the next inspection.
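
Because the interval is measured from the end of one inspection rather than from its start, a long inspection pushes the next one later. A minimal Python sketch of that calculation follows; the function name is hypothetical and the sketch ignores the Run and Sleep periods of the time table.

```python
from datetime import datetime, timedelta

# Illustrative sketch only; this function is not part of ProjectWise.


def next_inspection_start(previous_inspection_end: datetime,
                          interval_minutes: int) -> datetime:
    """Earliest start of the next inspection.

    The interval is counted from the END of the previous inspection.
    """
    if interval_minutes < 0:
        raise ValueError("interval_minutes must not be negative")
    return previous_inspection_end + timedelta(minutes=interval_minutes)
```

For example, if an inspection ends at 12:00 and the interval is 15 minutes, the next inspection begins at 12:15 regardless of when the previous one started.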

File Type Associations tab

Setting | Description
Application
Extension
Do not process these documents
Process the files as if they have the following extension
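
This page does not describe how the two options above are applied, so the following Python sketch is only one plausible reading of their labels: a file whose extension is marked "Do not process these documents" is skipped, and a file whose extension is mapped is processed as if it had the mapped extension. The function name, the data structures, and the sample extensions are all hypothetical.

```python
import os
from typing import Dict, Optional, Set

# Illustrative sketch only; an assumed model, not ProjectWise behavior.


def effective_extension(file_name: str,
                        do_not_process: Set[str],
                        process_as: Dict[str, str]) -> Optional[str]:
    """Return the extension to process the file as, or None to skip it.

    do_not_process: extensions (lowercase, no dot) that are skipped.
    process_as: maps an extension to the extension it is treated as.
    """
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if extension in do_not_process:
        return None
    # Fall back to the file's own extension when no mapping exists.
    return process_as.get(extension, extension)
```

Under these assumptions, `effective_extension("notes.log", set(), {"log": "txt"})` returns `"txt"`, and `effective_extension("plan.bak", {"bak"}, {})` returns `None`.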